CWM global search - an internet search engine for the chemist

Authors

  • Alexander Kos
  • Hans-Jürgen Himmler
Abstract

The Internet is a rich source of data and information for chemists. Numerous multidisciplinary databases are freely available, for example PubChem, ChemSpider, eMolecules, DrugBank, KEGG, NIST, ChemSynthesis, PharmGKB and Free Patents Online. An end user is unlikely to be aware of all these resources, and has neither the time to learn every user interface nor the means to search across all of them. CWM Global Search is an application that searches all these sources by structure, CAS Registry Number and free text. At present it performs searches in 30 databases and search engines, accessing more than 100 million pages that associate data with structures. The user can submit a single query structure, or several at once as an SDFile. In addition to molecule searches, CWM Global Search accepts reaction queries, in which case separate single-molecule searches are performed for the reactants, reagents and products. This makes it easy to find commercial suppliers and other synthesis-relevant information, such as safety data sheets, in one query.

Searching is technically less problematic than presenting the answers in a form the user can digest. Our first aid is search profiles: choose “Availability” if you want to find a commercial supplier, or “Biology” if you are looking for biological effects. The second aid is the results summary, a table of hyperlinks color-coded by topic. If a result page contains a link to an MSDS, for example, the topic ‘Safety’ is highlighted. Profiles limit a search to certain sources, and topics order the answers.

CWM Global Search is not the application for exact queries such as “give me the melting point of anthracene”. You will find the melting point on many pages, and the topic “Physical Property” might help, but in this case the link to Wikipedia gives the result quickest. Internet pages present data in an unordered fashion, and finding exact answers is time-consuming. In some cases, such as commercial suppliers, we check against an internal list whether a page really displays a supplier; otherwise PubChem may, for instance, only reference ChemSpider while ChemSpider in turn references only PubChem, and no supplier is found anywhere. We also have to consider that many resource providers would not allow us to extract data directly without leading the user to their Internet pages. The results of CWM Global Search are fuzzy by nature. This is an advantage if you are looking, for instance, for biological effects, which can be many, or if you want to learn why a compound could be important.

We generate both InChI identifiers and InChIKeys for the query structure and then perform fast text searches in the various databases and search engines. We also generate SMILES. Before standard InChI was released in 2009, database providers used different settings to generate the InChIs stored in their databases, so we use multiple settings when generating the InChI identifiers for a structure search in CWM Global Search. This maximizes the chance of finding an InChI regardless of the settings used by the database provider.
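The idea of searching with several InChI variants of one query structure can be illustrated with a minimal Python sketch. It is not the CWM Global Search implementation. The names `Record`, `identifier_variants` and `find_hits` are hypothetical, and the sketch assumes the standard InChI of the query is already available as a string. It models only the prefix difference between standard identifiers (`InChI=1S/`) and older non-standard ones (`InChI=1/`); different generation settings can also change the layers themselves, which would require real InChI software to reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One indexed page (all fields are illustrative assumptions)."""
    source: str  # label of the database or search engine
    url: str     # address of the page that would be shown to the user
    text: str    # page text in which identifiers are searched


def identifier_variants(standard_inchi: str) -> list[str]:
    """Return text-search candidates for one query structure.

    Assumption: only the version prefix differs between variants.
    Real setting differences may alter layers as well.
    """
    variants = [standard_inchi]
    prefix = "InChI=1S/"
    if standard_inchi.startswith(prefix):
        # Older, non-standard identifiers carry "InChI=1/" instead.
        variants.append("InChI=1/" + standard_inchi[len(prefix):])
    return variants


def find_hits(records: list[Record], variants: list[str]) -> list[Record]:
    """Keep every record whose text contains at least one variant."""
    return [r for r in records if any(v in r.text for v in variants)]
```

Called with the standard InChI of ethanol, `identifier_variants` yields both the `InChI=1S/...` and the `InChI=1/...` form, so a page that stores only the older form is still matched by `find_hits`. Plain substring matching is used for brevity; a production search would need to guard against matching a longer identifier that merely begins with the query string.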

Similar resources

CWM Global Search - The Internet Search Engine for Chemists and Biologists

CWM Global Search is a meta-search engine allowing chemists and biologists to search the major chemical and biological databases on the Internet, by structure, synonyms, CAS Registry Numbers and free text. A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to the...

Finding unusual peptides on the internet using plain three letter sequence codes

Finding peptides with modified amino acids is difficult or impossible when you use plain three letter sequence codes and BLAST. You can find those peptides when you use the structure as a query, but drawing the structure correctly is rather difficult for non-chemists. We developed CWM Global Search [1] with Proteax [2]. This is an Internet search engine that allows scientists such as biologists...

Getting to Know Wolfram|Alpha Computational Knowledge Engine and Its Applications in Biomedical Sciences

Wolfram|Alpha Computational Knowledge Engine software, unlike other internet search engines, tries to provide the best answer for a question or compute an equation in the most correct way based on current knowledge. Therefore, given the unique characteristics of Wolfram|Alpha and its vast applications, the aim of the present article is to familiarize biomedical scientists with...

Discovering Popular Clicks' Pattern of Teen Users for Query Recommendation

Search engines are still the most important gateways for information search on the internet. In this regard, providing the best response to the user's request in the shortest possible time is still desired. Normally, search engines are designed for adults, and few policies have been employed with teen users in mind. Teen users are more biased in clicking the results list than adult users are. This leads...

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...

Journal title:

Volume 2, Issue

Pages -

Publication date: 2010